depth value
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)
seMCD: Sequentially implemented Monte Carlo depth computation with statistical guarantees
Gnettner, Felix, Kirch, Claudia, Nieto-Reyes, Alicia
Statistical depth functions provide center-outward orderings in spaces of dimension larger than one, where a natural ordering does not exist. The numerical evaluation of such depth functions can be computationally prohibitive, even for relatively low dimensions. We present a novel sequentially implemented Monte Carlo methodology for the computation of theoretical and empirical depth functions and related quantities (seMCD). It outputs an interval, a so-called seMCD-bucket, to which the quantity of interest belongs with a high probability prespecified by the user. For specific classes of depth functions, we adapt algorithms from sequential testing, providing finite-sample guarantees. For depth functions dependent on unknown distributions, we offer asymptotic guarantees using non-parametric statistical methods. In contrast to plain-vanilla Monte Carlo methodology, the number of samples required in the algorithm is random but typically much smaller than standard choices suggested in the literature. The seMCD method can be applied to various depth functions, covering multivariate and functional spaces. We demonstrate the efficiency and reliability of our approach through empirical studies, highlighting its applicability in outlier or anomaly detection, classification, and depth region computation. In conclusion, the seMCD algorithm can achieve accurate depth approximations with few Monte Carlo samples while maintaining rigorous statistical guarantees.
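To make the "plain-vanilla Monte Carlo" baseline mentioned in this abstract concrete, here is a minimal sketch that approximates the Tukey halfspace depth of a point by minimizing over a fixed number of random directions. It is not the seMCD algorithm: there is no sequential stopping rule and no bucket with a coverage guarantee. The function names, the choice of halfspace depth, and the default of 500 directions are all assumptions made for illustration.

```python
import math
import random


def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere (normalized Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mc_halfspace_depth(x, sample, n_directions=500, seed=0):
    """Plain Monte Carlo approximation of the empirical halfspace depth of x.

    For each random direction u, compute the fraction of sample points lying
    in the closed halfspace {y : u.y >= u.x}. The minimum over the sampled
    directions is an upper bound on the exact depth, because the exact depth
    minimizes over all directions. n_directions is fixed in advance here
    (hypothetical default), unlike a sequential scheme.
    """
    rng = random.Random(seed)
    n = len(sample)
    depth = 1.0
    for _ in range(n_directions):
        u = random_direction(len(x), rng)
        px = dot(x, u)
        frac = sum(1 for p in sample if dot(p, u) >= px) / n
        depth = min(depth, frac)
    return depth


# A small symmetric 3x3 grid as toy data (illustrative only).
grid = [[i, j] for i in (-1, 0, 1) for j in (-1, 0, 1)]
center_depth = mc_halfspace_depth([0, 0], grid)
outlier_depth = mc_halfspace_depth([10, 10], grid)
```

The central point receives a larger depth than the far-away point, which is the center-outward ordering the abstract describes. A sequential method would instead decide adaptively when enough directions have been drawn.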
- Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Spain > Cantabria (0.04)
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Daxberger, Erik, Wenzel, Nina, Griffiths, David, Gang, Haiming, Lazarow, Justin, Kohavi, Gefen, Kang, Kai, Eichner, Marcin, Yang, Yinfei, Dehghan, Afshin, Grasch, Peter
Multimodal large language models (MLLMs) excel at 2D visual understanding but remain limited in their ability to reason about 3D space. In this work, we leverage large-scale high-quality 3D scene data with open-set annotations to introduce 1) a novel supervised fine-tuning dataset and 2) a new evaluation benchmark, focused on indoor scenes. Our Cubify Anything VQA (CA-VQA) data covers diverse spatial tasks including spatial relationship prediction, metric size and distance estimation, and 3D grounding. We show that CA-VQA enables us to train MM-Spatial, a strong generalist MLLM that also achieves state-of-the-art performance on 3D spatial understanding benchmarks, including our own. We show how incorporating metric depth and multi-view inputs (provided in CA-VQA) can further improve 3D understanding, and demonstrate that data alone allows our model to achieve depth perception capabilities comparable to dedicated monocular depth estimation models. We will publish our SFT dataset and benchmark.
- South America > Brazil (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
Direct Sparse Odometry with Continuous 3D Gaussian Maps for Indoor Environments
Deng, Jie, Lang, Fengtian, Yuan, Zikang, Yang, Xin
Accurate localization is essential for robotics and augmented reality applications such as autonomous navigation. Vision-based methods combining prior maps aim to integrate LiDAR-level accuracy with camera cost efficiency for robust pose estimation. Existing approaches, however, often depend on unreliable interpolation procedures when associating discrete point cloud maps with dense image pixels, which inevitably introduces depth errors and degrades pose estimation accuracy. We propose a monocular visual odometry framework utilizing a continuous 3D Gaussian map, which directly assigns geometrically consistent depth values to all extracted high-gradient points without interpolation. Evaluations on two public datasets demonstrate superior tracking accuracy compared to existing methods. We have released the source code of this work for the development of the community. Introduction: Visual odometry (VO)/visual-inertial odometry (VIO) is a crucial capability in a wide range of technologies, including robotics, unmanned aerial vehicles and mixed reality.
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
David Eigen, Christian Puhrsch, Rob Fergus
Predicting depth is an essential component in understanding the 3D geometry of a scene. While for stereo images local correspondence suffices for estimation, finding depth relations from a single image is less straightforward, requiring integration of both global and local information from various cues. Moreover, the task is inherently ambiguous, with a large source of uncertainty coming from the overall scale. In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. We also apply a scale-invariant error to help measure depth relations rather than scale. By leveraging the raw datasets as large sources of training data, our method achieves state-of-the-art results on both NYU Depth and KITTI, and matches detailed depth boundaries without the need for superpixelation.
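The scale-invariant error mentioned in this abstract can be illustrated with a short sketch. The abstract does not spell out the formula; the version below is the log-space form commonly associated with this paper, with d_i = log(pred_i) - log(target_i) and error = mean(d_i^2) - lam * (sum d_i)^2 / n^2. Treat the exact form, the function name, and the `lam` parameter as assumptions rather than a quotation of the authors' implementation.

```python
import math


def scale_invariant_error(pred, target, lam=1.0):
    """Scale-invariant error between two lists of positive depth values.

    Assumed form (not quoted from the abstract):
        d_i = log(pred_i) - log(target_i)
        error = (1/n) * sum(d_i^2) - lam * (1/n^2) * (sum d_i)^2
    With lam = 1.0, multiplying every prediction by the same positive
    constant leaves the error unchanged.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(d)
    return sum(v * v for v in d) / n - lam * (sum(d) ** 2) / (n * n)


# Toy depths in arbitrary units (illustrative values only).
target = [1.0, 2.0, 3.0]
pred = [1.2, 1.9, 3.5]

err = scale_invariant_error(pred, target)
err_rescaled = scale_invariant_error([3.0 * p for p in pred], target)
err_pure_scale = scale_invariant_error([2.0 * t for t in target], target)
```

Here `err` and `err_rescaled` agree up to floating-point rounding, and a prediction that is off only by a global factor scores approximately zero, so the measure penalizes errors in depth relations rather than errors in overall scale.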
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Journey into Automation: Image-Derived Pavement Texture Extraction and Evaluation
Lu, Bingjie, Dan, Han-Cheng, Zhang, Yichen, Huang, Zhetao
Mean texture depth (MTD) is pivotal in assessing the skid resistance of asphalt pavements and ensuring road safety. This study focuses on developing an automated system for extracting texture features and evaluating MTD based on pavement images. The contributions of this work are threefold: first, it proposes an economical method to acquire three-dimensional (3D) pavement texture data; second, it enhances 3D image processing techniques and formulates features that represent various aspects of texture; third, it establishes multivariate prediction models that link these features with MTD values. Validation results demonstrate that the Gradient Boosting Tree (GBT) model achieves remarkable prediction stability and accuracy (R² = 0.9858), and field tests indicate the superiority of the proposed method over other techniques, with relative errors below 10%. This method offers a comprehensive end-to-end solution for pavement quality evaluation, from image input to MTD prediction output.
- Asia > South Korea (0.04)
- Asia > China > Hunan Province (0.04)
- Information Technology (1.00)
- Energy (0.89)
- Transportation > Ground > Road (0.66)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
Multimodal Object Detection using Depth and Image Data for Manufacturing Parts
Mahjourian, Nazanin, Nguyen, Vinh
Manufacturing requires reliable object detection methods for precise picking and handling of diverse types of manufacturing parts and components. Traditional object detection methods utilize either only 2D images from cameras or 3D data from lidars or similar 3D sensors. However, each of these sensors has weaknesses and limitations. Cameras do not have depth perception, and 3D sensors typically do not carry color information. These weaknesses can undermine the reliability and robustness of industrial manufacturing systems. To address these challenges, this work proposes a multi-sensor system combining a red-green-blue (RGB) camera and a 3D point cloud sensor. The two sensors are calibrated for precise alignment of the multimodal data captured from the two hardware devices. A novel multimodal object detection method is developed to process both RGB and depth data. This object detector is based on the Faster R-CNN baseline that was originally designed to process only camera images. The results show that the multimodal model significantly outperforms the depth-only and RGB-only baselines on established object detection metrics. More specifically, the multimodal model improves mAP by 13% and raises Mean Precision by 11.8% in comparison to the RGB-only baseline. Compared to the depth-only baseline, it improves mAP by 78% and raises Mean Precision by 57%. Hence, this method facilitates more reliable and robust object detection in service of smart manufacturing applications.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Michigan (0.04)